Power-Law Based Estimation of Set Similarity Join Size
Authors
Abstract
We propose a novel technique for estimating the size of a set similarity join (SSJoin). The technique relies on a succinct representation of sets using Min-Hash signatures. We estimate the SSJoin size by counting the support of frequent patterns in the signatures. Because the counts of signature patterns overlap, we need the set Inclusion-Exclusion (IE) principle, and we develop a novel lattice-based counting method for evaluating it efficiently. The counting method is linear in the lattice size. To keep the mining process lightweight, we exploit a recently discovered power-law relationship between pattern count and frequency. Extensive experimental evaluations show that the proposed technique produces accurate estimates efficiently.
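As a rough illustration of the Min-Hash representation the abstract builds on, the sketch below computes signatures and uses the fraction of agreeing signature positions as an estimate of Jaccard similarity. It is not the paper's estimator: the pattern mining, lattice-based counting, and power-law steps are not shown. The choice of Jaccard as the similarity measure, the linear hash family, and all function names (`make_hash_params`, `minhash_signature`, `estimated_jaccard`, `count_similar_pairs`) are assumptions made for this example.

```python
import random
import zlib
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


def make_hash_params(k, seed=0):
    """Draw k (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def element_id(x):
    """Map an arbitrary element to a stable integer id (CRC32 of its string form)."""
    return zlib.crc32(str(x).encode("utf-8")) % PRIME


def minhash_signature(items, params):
    """Signature = minimum hash value of the (non-empty) set under each hash function."""
    ids = [element_id(x) for x in items]
    return tuple(min((a * i + b) % PRIME for i in ids) for a, b in params)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def count_similar_pairs(signatures, threshold):
    """Naive quadratic baseline: count pairs whose estimated similarity meets the threshold."""
    return sum(
        1
        for s, t in combinations(signatures, 2)
        if estimated_jaccard(s, t) >= threshold
    )
```

The quadratic `count_similar_pairs` baseline compares every pair of signatures, which is exactly the cost a join size estimator tries to avoid; it is included only to make the quantity being estimated concrete.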
Similar resources
Similarity Join Size Estimation using Locality Sensitive Hashing
Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the ...
An Advanced State Estimation Method Using Virtual Meters
Power system state estimation is a central component in the energy management systems of power systems. The goal of state estimation is to determine the system status and the power flow of transmission lines. This paper presents an advanced state estimation algorithm based on weighted least square (WLS) criteria by introducing virtual meters. For each bus of network, except slack bus, a virtual meter...
Fuzzy Logic Based Life Estimation of PWM Driven Induction Motors
Pulse-width modulated (PWM) adjustable frequency drives (AFDs) are extensively used in industry for control of induction motors. This has led to significant advantages in terms of performance, size, and efficiency, but the output voltage waveform no longer remains sinusoidal. Hence, overshoots, high rates of rise, harmonics, and transients are observed in the voltage wave. They increase voltag...
Grid Impedance Estimation Using Several Short-Term Low Power Signal Injections
In this paper, a signal processing method is proposed to estimate the low and high-frequency impedances of power systems using several short-term low power signal injections for a frequency range of 0-150 kHz. This frequency range is very important, and thus it is considered in the analysis of power quality issues of smart grids. The impedance estimation is used in many power system applicati...
TOPOLOGICAL SIMILARITY OF L-RELATIONS
$L$-fuzzy rough sets are extensions of the classical rough sets by relaxing the equivalence relations to $L$-relations. The topological structures induced by $L$-fuzzy rough sets have opened up the way for applications of topological facts and methods in granular computing. In this paper, we firstly prove that each arbitrary $L$-relation can generate an Alexandrov $L$-topology. Based on this fact, w...
Journal title:
- PVLDB
Volume 2, Issue
Pages -
Publication date 2009